Abstract: A causal rate distortion function with a general
fidelity criterion is formulated on abstract alphabets and the
optimal reconstruction kernel is derived, which consists of a
product of causal kernels. In the process, general abstract
spaces are introduced to show existence of the minimizing
kernel using weak*-convergence. Certain properties of the
causal rate distortion function are presented.

I. INTRODUCTION 

This paper is concerned with lossy data compression subject
to a distortion or fidelity criterion and causal decoding on
abstract alphabets. Its information theoretic interpretation
is the causal rate distortion function formulated via the
directed information between the source sequence
$X^n \triangleq \{X_0, X_1, \ldots, X_n\}$ and its reproduction sequence
$Y^n \triangleq \{Y_0, Y_1, \ldots, Y_n\}$ defined by

$$I(X^n \rightarrow Y^n) \triangleq \sum_{i=0}^{n} I(X^i; Y_i | Y^{i-1}) \qquad (1)$$

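The average distortion constraint is

$$E\{d_{0,n}(X^n, Y^n)\} \le D, \qquad d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i, y^i) \qquad (2)$$

where $D \ge 0$ and $d_{0,n}(\cdot,\cdot)$ is a non-negative distortion function.
Define the causal product of conditional distributions by

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \triangleq \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) \qquad (3)$$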

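where $P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$ denotes the conditional
distribution of $Y_i$ given $(Y^{i-1}, X^i)$, $i = 0, 1, \ldots, n$.
Since causal codes as defined in [4] satisfy
$P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1}) = P_{X_i|X^{i-1}}(dx_i|x^{i-1})$, $P$-a.s.
(see also Lemma 2.4), in the analysis it is convenient to
express $I(X^n \rightarrow Y^n)$ as a functional of
$\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)\}$ as follows.

$$I(X^n \rightarrow Y^n) = \int \log\left(\frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\right) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) P_{X^n}(dx^n) \qquad (4)$$

$$\equiv \mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \qquad (5)$$

where $\mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ indicates the functional dependence
of $I(X^n \rightarrow Y^n)$ on $\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}\}$.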



The causal information rate distortion function investigated
is

$$\inf_{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n):\; E\{d_{0,n}(X^n, Y^n)\} \le D} I(X^n \rightarrow Y^n) \qquad (6)$$

Under appropriate assumptions on $d_{0,n}(\cdot,\cdot)$ it is shown that
the optimal causal product (reproduction channel) $\overrightarrow{P}^*_{Y^n|X^n}$
which achieves the infimum in (6) is given by

$$\overrightarrow{P}^*_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} \frac{e^{s\rho_{0,i}(x^i,y^i)} P^*_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}{\int e^{s\rho_{0,i}(x^i,y^i)} P^*_{Y_i|Y^{i-1}}(dy_i|y^{i-1})} \qquad (7)$$

where $s \le 0$ is the Lagrange multiplier associated with
the fidelity constraint. The operational meaning of (6) is
shown in [5] via coding theorems (for what are called sequential codes),
hence this aspect will not be discussed. Rather, the main
emphasis of the paper is the mathematical formulation, the
proof of existence of a solution to (6), the derivation of (7),
the derivation of a closed form expression for the causal rate
distortion function, and some of its properties.
The Shannon source code consists of an encoder-decoder
pair. The encoder observes a source sequence $X^\infty \triangleq
\{X_0, X_1, \ldots\}$ and generates a compressed representation
$\{Z_0, Z_1, \ldots\}$. The decoder, upon observing the representation
sequence $\{Z_0, Z_1, \ldots\}$, generates a reproduction
$Y_i = f_i(X^\infty)$ of $X_i$, for every time step $i$. The dependence
of the reproduction sequence on the future source symbols,
in addition to the past and present symbols, makes such a
decoder non-causal. In Neuhoff and Gilbert [4], a source
code is defined as causal if the reproduction sequence is
such that $f_i(X^\infty) = f_i(\tilde{X}^\infty)$ whenever $X^i = \tilde{X}^i$,
$\forall i = 0, 1, \ldots$. The definition of a causal code necessitates that any
information theoretic causal rate distortion function should
lead to an optimal reconstruction conditional distribution
which is causally dependent on the source symbols, and (7)
has this property.

The classical rate distortion function is defined via the mutual
information between $X^n$ and $Y^n$, namely $I(X^n; Y^n)$, with
average distortion (2), and the code is assumed non-causal,
leading to the well known optimal reconstruction [1], [3]

$$P^*_{Y^n|X^n}(dy^n|x^n) = \frac{e^{s\sum_{i=0}^{n}\rho_{0,i}(x^i,y^i)} P^*_{Y^n}(dy^n)}{\int e^{s\sum_{i=0}^{n}\rho_{0,i}(x^i,y^i)} P^*_{Y^n}(dy^n)} \qquad (8)$$

Since by the chain rule
$P^*_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P^*_{Y_i|Y^{i-1},X^n}(dy_i|Y^{i-1} = y^{i-1}, X^n = x^n)$, the
classical rate distortion theory gives a reconstruction $Y_i = y_i$
which depends on future values of the source symbols
$(X_{i+1} = x_{i+1}, \ldots, X_n = x_n)$ in addition to its past
reconstructions $Y^{i-1} = y^{i-1}$, and past and present source
symbols $X^i = x^i$. The point to be made here is that, in
general, aside from some special examples, such as the i.i.d.
source and single letter distortion $d_{0,n} = \sum_{i=0}^{n} \rho_i(x_i, y_i)$
[2], the reconstruction conditional distribution and hence the
decoder of the classical rate distortion function is non-causal.
On the other hand, a code is causal if the reconstruction
distribution is causal.

II. PROBLEM FORMULATION 
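In this section, we introduce the setup of the problem
on discrete time sets $\mathbb{N}^n \triangleq \{0, 1, \ldots, n\}$, $n \in \mathbb{N} \triangleq
\{0, 1, 2, \ldots\}$. Assume all processes are defined on a complete
probability space $(\Omega, \mathcal{F}, P)$ with filtration $\{\mathcal{F}_t\}_{t \ge 0}$.
The source and reconstruction alphabets are sequences of
Polish spaces [11] $\{\mathcal{X}_t : t \in \mathbb{N}\}$ and $\{\mathcal{Y}_t : t \in \mathbb{N}\}$,
respectively (e.g., $\mathcal{Y}_t, \mathcal{X}_t$ are complete separable metric
spaces), associated with their corresponding measurable
spaces $(\mathcal{X}_t, \mathcal{B}(\mathcal{X}_t))$ and $(\mathcal{Y}_t, \mathcal{B}(\mathcal{Y}_t))$ (e.g., $\mathcal{B}(\mathcal{X}_t)$ is the Borel
$\sigma$-algebra of subsets of the set $\mathcal{X}_t$ generated by closed
sets), $t \in \mathbb{N}$. Sequences of alphabets are identified with
the product spaces $(\mathcal{X}_{0,n}, \mathcal{B}(\mathcal{X}_{0,n})) \triangleq \times_{k=0}^{n} (\mathcal{X}_k, \mathcal{B}(\mathcal{X}_k))$
and $(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n})) \triangleq \times_{k=0}^{n} (\mathcal{Y}_k, \mathcal{B}(\mathcal{Y}_k))$. The source and
reconstruction are processes denoted by $X^n \triangleq \{X_t : t \in
\mathbb{N}^n\}$, $X : \mathbb{N}^n \times \Omega \rightarrow \mathcal{X}_t$, and by $Y^n \triangleq \{Y_t : t \in \mathbb{N}^n\}$,
$Y : \mathbb{N}^n \times \Omega \rightarrow \mathcal{Y}_t$, respectively. Probability measures on any
measurable space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ are denoted by $\mathcal{M}_1(\mathcal{Z})$. It is
assumed that the $\sigma$-algebras $\sigma\{X^{-1}\} = \sigma\{Y^{-1}\} = \{\emptyset, \Omega\}$.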

Definition 2.1: Let $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ be measurable
spaces in which $\mathcal{Y}$ is a Polish space.

A stochastic kernel on $\mathcal{Y}$ given $\mathcal{X}$ is a mapping $q : \mathcal{B}(\mathcal{Y}) \times
\mathcal{X} \rightarrow [0, 1]$ satisfying the following two properties:

1) For every $x \in \mathcal{X}$, the set function $q(\cdot; x)$ is a probability
measure (possibly finitely additive) on $\mathcal{B}(\mathcal{Y})$.

2) For every $F \in \mathcal{B}(\mathcal{Y})$, the function $q(F; \cdot)$ is $\mathcal{B}(\mathcal{X})$-
measurable.

The set of all such stochastic kernels is denoted by $\mathcal{Q}(\mathcal{Y}; \mathcal{X})$.

An important notion is conditional independence. The
Random Variable (R.V.) $Z$ is called conditionally independent
of the R.V. $X$ given the R.V. $Y$ if and only if $X \leftrightarrow Y \leftrightarrow Z$
forms a Markov chain in both directions.

Stochastic kernels can be used to define non-causal and 
causal product reconstruction kernels and associated rate 
distortion functions. 

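Definition 2.2: Given measurable spaces $(\mathcal{X}_{0,n}, \mathcal{B}(\mathcal{X}_{0,n}))$,
$(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n}))$, and their product spaces, data compression
channels are defined as follows.

1) A Non-Causal Data Compression Channel is a stochastic
kernel $q_{0,n}(dy^n; x^n) \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, $n \in \mathbb{N}$.

2) A Causal Product Data Compression Channel is a
product of a sequence of causal stochastic kernels
defined by

$$\overrightarrow{q}_{0,n}(dy^n; x^n) \triangleq \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i)$$

where $q_i \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$, $i = 0, \ldots, n$, $n \in \mathbb{N}$.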
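For finite alphabets the causal product in Definition 2.2 reduces to a product of conditional probability tables. The following is a minimal sketch, assuming binary alphabets and horizon n = 1; the names `random_kernel`, `q0`, `q1` and `causal_product` are illustrative and do not appear in the paper. It shows that the reproduction $y_0$ generated by a causal product kernel does not depend on the future source symbol $x_1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_kernel(shape):
    """Random stochastic kernel: the last axis is the output symbol and sums to 1."""
    k = rng.random(shape)
    return k / k.sum(axis=-1, keepdims=True)

# Hypothetical binary alphabets, horizon n = 1 (two time steps).
q0 = random_kernel((2, 2))        # q0[x0, y0]         = q_0(y0; x0)
q1 = random_kernel((2, 2, 2, 2))  # q1[y0, x0, x1, y1] = q_1(y1; y0, x0, x1)

# Causal product kernel, indexed as causal_product[x0, x1, y0, y1]
#   = q_0(y0; x0) * q_1(y1; y0, x0, x1)
causal_product = np.einsum("ab,bacd->acbd", q0, q1)

# Summing out y1 leaves the law of y0 given (x0, x1); it equals q_0(y0; x0)
# for every value of x1, which is the causality property.
y0_given_x = causal_product.sum(axis=3)
print(np.allclose(y0_given_x[:, 0, :], y0_given_x[:, 1, :]))
```

A general (non-causal) kernel $q_{0,1}(dy^1; x^1)$ would be an arbitrary table of the same shape, and the final check would typically fail for it.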



Note that classical rate distortion theory is concerned with
finding the optimal $P_{Y^n|X^n}(dy^n|X^n = x^n)$, which is generally
non-causal, while in this paper the interest is to find
the optimal causal product kernel.

A. Causal and Classical Rate Distortion Functions 

In this section the classical rate distortion function which 
has a non-causal structure is reviewed, and then the causal 
rate distortion function is defined. 

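Given a source probability measure $\mu_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$
(possibly finitely additive) and a reconstruction kernel $q_{0,n} \in
\mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, one can define three probability measures as
follows.

(P1): The joint measure $P_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$:

$$P_{0,n}(G_{0,n}) \triangleq (\mu_{0,n} \otimes q_{0,n})(G_{0,n}), \quad G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$$

$$= \int_{\mathcal{X}_{0,n}} q_{0,n}(G_{0,n,x^n}; x^n) \mu_{0,n}(dx^n)$$

where $G_{0,n,x^n}$ is the $x^n$-section of $G_{0,n}$ at point $x^n$ defined
by $G_{0,n,x^n} \triangleq \{y^n \in \mathcal{Y}_{0,n} : (x^n, y^n) \in G_{0,n}\}$ and $\otimes$ denotes
the convolution.

(P2): The marginal measure $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$\nu_{0,n}(F_{0,n}) \triangleq P_{0,n}(\mathcal{X}_{0,n} \times F_{0,n}), \quad F_{0,n} \in \mathcal{B}(\mathcal{Y}_{0,n})$$

$$= \int_{\mathcal{X}_{0,n}} q_{0,n}((\mathcal{X}_{0,n} \times F_{0,n})_{x^n}; x^n) \mu_{0,n}(dx^n) = \int_{\mathcal{X}_{0,n}} q_{0,n}(F_{0,n}; x^n) \mu_{0,n}(dx^n)$$

(P3): The product measure $\pi_{0,n} : \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n}) \rightarrow
[0, 1]$ of $\mu_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ and $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$\pi_{0,n}(G_{0,n}) \triangleq (\mu_{0,n} \times \nu_{0,n})(G_{0,n}), \quad G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$$

$$= \int_{\mathcal{X}_{0,n}} \nu_{0,n}(G_{0,n,x^n}) \mu_{0,n}(dx^n)$$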

The mutual information between two
sequences of Random Variables $X^n$ and $Y^n$, denoted
$I(X^n; Y^n)$, is defined via the Kullback-Leibler distance (or
relative entropy) between the joint probability distribution of
$(X^n, Y^n)$ and the product of the marginal probability distributions
of $X^n$ and $Y^n$, using the Radon-Nikodym derivative.
Hence, by the construction of probability measures (P1)-(P3),
and the chain rule of relative entropy [11]:

$$I(X^n; Y^n) \triangleq D(P_{0,n} \| \pi_{0,n}) \qquad (9)$$

$$= \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\left(\frac{d(\mu_{0,n} \otimes q_{0,n})}{d(\mu_{0,n} \times \nu_{0,n})}\right) d(\mu_{0,n} \otimes q_{0,n})$$

$$= \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\left(\frac{q_{0,n}(dy^n; x^n)}{\nu_{0,n}(dy^n)}\right) q_{0,n}(dy^n; x^n) \mu_{0,n}(dx^n)$$

$$= \int_{\mathcal{X}_{0,n}} D(q_{0,n}(\cdot; x^n) \| \nu_{0,n}(\cdot)) \mu_{0,n}(dx^n)$$

$$\equiv \mathbb{I}(\mu_{0,n}; q_{0,n}) \qquad (10)$$

Note that (10) states that mutual information is expressed as
a functional of $\{\mu_{0,n}, q_{0,n}\}$ and it is denoted by $\mathbb{I}(\mu_{0,n}; q_{0,n})$.
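On finite alphabets the three measures (P1)-(P3) become arrays and the relative entropy becomes a finite sum. The sketch below is an illustration under that finite-alphabet assumption; the function name `mutual_information` and the two example kernels are hypothetical and are not taken from the paper.

```python
import numpy as np

def mutual_information(mu, q):
    """I(mu; q) in nats for a finite source pmf mu[x] and a kernel q[x, y]."""
    joint = mu[:, None] * q                # (P1) joint measure mu (x) q
    nu = joint.sum(axis=0)                 # (P2) marginal on the reproduction
    product = mu[:, None] * nu[None, :]    # (P3) product measure mu x nu
    mask = joint > 0                       # convention: 0 * log 0 = 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))

mu = np.array([0.5, 0.5])
noiseless = np.eye(2)                           # y = x with probability one
useless = np.array([[0.3, 0.7], [0.3, 0.7]])    # y independent of x

print(mutual_information(mu, noiseless))  # equals log 2 for this source
print(mutual_information(mu, useless))    # equals 0: joint and product coincide
```

The two extreme kernels bracket the behaviour of the functional: it vanishes exactly when the joint measure equals the product measure.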



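Note that necessary and sufficient conditions for existence of
a Radon-Nikodym derivative for finitely additive measures
can be found in [13]. Moreover, $I(X^n; Y^n)$ is also expressed
as the sum of two directed informations as follows

$$I(X^n; Y^n) = I(X^n \rightarrow Y^n) + I(X^n \leftarrow Y^n) \qquad (11)$$

where

$$I(X^n \rightarrow Y^n) \triangleq \sum_{i=0}^{n} I(X^i; Y_i | Y^{i-1}) \qquad (12)$$

$$I(X^n \leftarrow Y^n) \triangleq \sum_{i=0}^{n} I(X_i; Y^{i-1} | X^{i-1}) \qquad (13)$$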

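Definition 2.3: (Classical Rate Distortion Function) Let
$d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \rightarrow [0, \infty)$ be a $\mathcal{B}(\mathcal{X}_{0,n}) \times
\mathcal{B}(\mathcal{Y}_{0,n})$-measurable distortion function, and let $Q_{0,n}(D) \subset
\mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ (assumed non-empty) denote the average
distortion or fidelity constraint set defined by

$$Q_{0,n}(D) \triangleq \Big\{ q_{0,n} \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) : \frac{1}{n+1} \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n) q_{0,n}(dy^n; x^n) \mu_{0,n}(dx^n) \le D \Big\}, \quad D \ge 0 \qquad (14)$$

The classical rate distortion function associated with the non-causal
kernel $q_{0,n} \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is defined by

$$R_{0,n}(D) \triangleq \inf_{q_{0,n} \in Q_{0,n}(D)} \frac{1}{n+1} \mathbb{I}(\mu_{0,n}; q_{0,n}) \qquad (15)$$

while its operational meaning can be established via
$\limsup_{n \rightarrow \infty} R_{0,n}(D)$.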

Existence in (15) is shown assuming $d_{0,n}(x^n, \cdot)$ is bounded
continuous on $\mathcal{Y}_{0,n}$ and $\mathcal{Y}_{0,n}$ is compact, using weak
convergence of probability measures in [3], and for more
general $d_{0,n}(x^n, \cdot)$ which is only continuous on $\mathcal{Y}_{0,n}$ using
weak*-convergence of measures [14] on Polish spaces.
A version of the optimal reconstruction kernel which attains
the infimum in (15) [3] is

$$q^*_{0,n}(dy^n; x^n) = \frac{e^{s d_{0,n}(x^n, y^n)} \nu^*_{0,n}(dy^n)}{\int_{\mathcal{Y}_{0,n}} e^{s d_{0,n}(x^n, y^n)} \nu^*_{0,n}(dy^n)} \qquad (16)$$

where $\nu^*_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ is the marginal of $P^*_{0,n} = \mu_{0,n} \otimes
q^*_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n})$ and $s \le 0$ is the Lagrange multiplier
associated with the fidelity constraint (14). Unfortunately,
for general sources and distortion function $d_{0,n}$, the optimal
reconstruction $q^*_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q^*_i(dy_i; y^{i-1}, x^n)$
is non-causal and introduces delay in the reconstruction
process. On the other hand, if the solution (16) gives a
reconstruction such that $q^*_{0,n}(dy^n; x^n) = \overrightarrow{q}^*_{0,n}(dy^n; x^n) =
\otimes_{i=0}^{n} q^*_i(dy_i; y^{i-1}, x^i)$, it will be causal. However, there are
only limited examples in which (16) is causal on the source
sequence. For a single letter distortion function $d_{0,n}(x^n, y^n) \triangleq
\sum_{i=0}^{n} \rho_i(x_i, y_i)$ and independent sources $\mu_{0,n}(dx^n) =
\otimes_{i=0}^{n} \mu_i(dx_i)$ (e.g., $\{X_i : i \in \mathbb{N}\}$ are independent) the
optimal reconstruction $q^*_{0,n}(dy^n; x^n)$ factors into a product
of causal kernels $q^*_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q^*_i(dy_i; x_i)$ [2].
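The exponentially tilted form of the classical optimal kernel can be explored numerically on finite alphabets. The sketch below uses the Blahut-Arimoto iteration, a standard finite-alphabet algorithm that is not the method of this paper (which works on abstract alphabets via weak*-compactness). The function name `blahut_arimoto`, the uniform binary source, and the Hamming distortion are illustrative assumptions.

```python
import numpy as np

def blahut_arimoto(mu, d, s, iters=500):
    """One point of the classical R(D) curve for a slope s <= 0 (rate in nats).

    mu[x] is the source pmf, d[x, y] the distortion matrix.
    """
    nu = np.full(d.shape[1], 1.0 / d.shape[1])   # initial reproduction marginal
    tilt = np.exp(s * d)                          # e^{s d(x, y)}
    for _ in range(iters):
        q = tilt * nu                             # unnormalised tilted kernel
        q /= q.sum(axis=1, keepdims=True)         # q(y; x) proportional to e^{s d} nu
        nu = mu @ q                               # marginal induced by mu and q
    distortion = float(np.sum(mu[:, None] * q * d))
    rate = float(np.sum(mu[:, None] * q * np.log(q / nu)))
    return rate, distortion, q

# Hypothetical example: uniform binary source with Hamming distortion.
mu = np.array([0.5, 0.5])
hamming = np.array([[0.0, 1.0], [1.0, 0.0]])
rate, distortion, kernel = blahut_arimoto(mu, hamming, s=-2.0)
print(rate, distortion)
```

For this symmetric example the iteration is already at its fixed point after the first step, and the resulting pair agrees with the known binary formula $R(D) = \log 2 - h(D)$ in nats, where $h$ is the binary entropy.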



This raises the question of whether the classical rate distortion
function can be reformulated using the causal product
reconstruction kernel.

The next lemma relates causal product reconstruction 
kernels, mutual information, directed information, and con- 
ditional independence. 

Lemma 2.4: The following are equivalent for each $n \in \mathbb{N}$.

1) $q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n)$, as defined in
Definition 2.2-2)

2) For each $i = 0, 1, \ldots, n-1$, $Y_i \leftrightarrow (X^i, Y^{i-1}) \leftrightarrow
(X_{i+1}, \ldots, X_n)$ forms a Markov chain

3) $I(X^n; Y^n) = I(X^n \rightarrow Y^n)$

4) $I(X^n \leftarrow Y^n) = 0$

5) For each $i = 0, 1, \ldots, n-1$, $Y^i \leftrightarrow X^i \leftrightarrow X_{i+1}$ forms
a Markov chain

Proof. Omitted due to space limitation.
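The implication from statement 1) to statement 3) of Lemma 2.4 can be checked numerically in a small case. The following is a sketch only, assuming binary alphabets, horizon n = 1, and randomly drawn tables; the helpers `entropy`, `marginal` and `cond_mi` are illustrative names, and conditional mutual information is computed through the identity $I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a pmf given as an array of any shape."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def marginal(joint, keep):
    """Sum out every axis of `joint` that is not listed in `keep`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop)

def cond_mi(joint, a, b, c=()):
    """I(A; B | C) in nats; a, b, c are tuples of axis indices of `joint`."""
    return (entropy(marginal(joint, a + c)) + entropy(marginal(joint, b + c))
            - entropy(marginal(joint, a + b + c)) - entropy(marginal(joint, c)))

rng = np.random.default_rng(1)
mu = rng.random((2, 2)); mu /= mu.sum()                         # mu[x0, x1]
q0 = rng.random((2, 2)); q0 /= q0.sum(-1, keepdims=True)        # q0[x0, y0]
q1 = rng.random((2, 2, 2, 2)); q1 /= q1.sum(-1, keepdims=True)  # q1[y0, x0, x1, y1]

# joint[x0, x1, y0, y1] = mu(x0, x1) q_0(y0; x0) q_1(y1; y0, x0, x1)
joint = np.einsum("ac,ab,bacd->acbd", mu, q0, q1)

mutual = cond_mi(joint, (0, 1), (2, 3))                  # I(X^1; Y^1)
directed = (cond_mi(joint, (0,), (2,))                   # I(X_0; Y_0)
            + cond_mi(joint, (0, 1), (3,), (2,)))        # I(X^1; Y_1 | Y_0)
print(mutual, directed)
```

The two printed values agree up to floating point error because the kernel is a causal product; the source `mu` is deliberately not a product measure, so the agreement does not rely on independence of the source symbols.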
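According to Lemma 2.4, a source satisfying the conditional
distribution $P_{X_i|X^{i-1},Y^{i-1}}(dx_i|X^{i-1} = x^{i-1}, Y^{i-1} =
y^{i-1}) = P_{X_i|X^{i-1}}(dx_i|X^{i-1} = x^{i-1})$, $P$-a.s., $\forall i \in \mathbb{N}$, is
equivalent to any of the equivalent statements of Lemma 2.4.
Therefore, for such a source the mutual information becomes

$$I(X^n; Y^n) = \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\left(\frac{\overrightarrow{q}_{0,n}(dy^n; x^n)}{\nu_{0,n}(dy^n)}\right) \overrightarrow{q}_{0,n}(dy^n; x^n) \mu_{0,n}(dx^n) \qquad (17)$$

$$\equiv \mathbb{I}(\mu_{0,n}; \overrightarrow{q}_{0,n}) \qquad (18)$$

where (18) states that $I(X^n; Y^n)$ is a functional of
$\{\mu_{0,n}, \overrightarrow{q}_{0,n}\}$. Hence, the causal rate distortion function is defined by
optimizing $\mathbb{I}(\mu_{0,n}; \overrightarrow{q}_{0,n})$ over $\overrightarrow{q}_{0,n}$ which satisfies a distortion
constraint.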

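Definition 2.5: (Causal Rate Distortion Function) Suppose
$d_{0,n} \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i, y^i)$, where $\rho_{0,i} : \mathcal{X}_{0,i} \times \mathcal{Y}_{0,i} \rightarrow
[0, \infty)$ is a sequence of $\mathcal{B}(\mathcal{X}_{0,i}) \times \mathcal{B}(\mathcal{Y}_{0,i})$-measurable
distortion functions, and let $\overrightarrow{Q}_{0,n}(D)$ (assumed non-empty)
denote the average distortion or fidelity constraint set defined
by

$$\overrightarrow{Q}_{0,n}(D) \triangleq \Big\{ \overrightarrow{q}_{0,n} : \frac{1}{n+1} \sum_{i=0}^{n} \int_{\mathcal{X}_{0,i}} \int_{\mathcal{Y}_{0,i}} \rho_{0,i}(x^i, y^i) \overrightarrow{q}_{0,i}(dy^i; x^i) \mu_{0,i}(dx^i) \le D \Big\}, \quad D \ge 0 \qquad (19)$$

The causal rate distortion function associated with the causal
product kernel $\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{0,n}(D)$ is defined by

$$R^c_{0,n}(D) \triangleq \inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{0,n}(D)} \frac{1}{n+1} \mathbb{I}(\mu_{0,n}; \overrightarrow{q}_{0,n}) \qquad (20)$$

while its operational meaning can be established via
$\limsup_{n \rightarrow \infty} R^c_{0,n}(D)$.

Clearly, $R^c_{0,n}(D)$ is characterized by minimizing directed
information, or equivalently $\mathbb{I}(\mu_{0,n}; \overrightarrow{q}_{0,n})$, over the causal
product measure $\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{0,n}(D)$.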



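Lemma 2.6: $\overrightarrow{q}_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ is uniquely determined
by $\{q_i \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})\}_{i=0}^{n}$ and vice-versa, $P$-a.s.
Proof. For densities this result is derived in [15].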

III. EXISTENCE OF OPTIMAL CAUSAL PRODUCT 
RECONSTRUCTION KERNEL 

In this section, appropriate topologies and function spaces
are employed to show existence of the minimizing causal
product kernel in (20). In the process we also show existence
for $R_{0,n}(D)$.

A. Abstract Spaces 

Let $BC(\mathcal{Y}_{0,n})$ denote the vector space of bounded continuous
real valued functions defined on the Polish space
$\mathcal{Y}_{0,n}$. Furnished with the sup norm topology, this is a
Banach space. The topological dual of $BC(\mathcal{Y}_{0,n})$, denoted
by $(BC(\mathcal{Y}_{0,n}))^*$, is isometrically isomorphic to the Banach
space of finitely additive regular bounded signed measures
on $\mathcal{Y}_{0,n}$ [7], denoted by $M_{rba}(\mathcal{Y}_{0,n})$. Let $\Pi_{rba}(\mathcal{Y}_{0,n}) \subset
M_{rba}(\mathcal{Y}_{0,n})$ denote the set of regular bounded finitely additive
probability measures on $\mathcal{Y}_{0,n}$. Clearly if $\mathcal{Y}_{0,n}$ is compact,
then $(BC(\mathcal{Y}_{0,n}))^*$ will be isometrically isomorphic to the
space of countably additive signed measures, as in [3]. Denote
by $L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ the space of all $\mu_{0,n}$-integrable
functions defined on $\mathcal{X}_{0,n}$ with values in $BC(\mathcal{Y}_{0,n})$, so that
for each $\phi \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ its norm is defined by

$$\|\phi\|_{\mu_{0,n}} \triangleq \int_{\mathcal{X}_{0,n}} \|\phi(x^n)(\cdot)\|_{BC(\mathcal{Y}_{0,n})} \mu_{0,n}(dx^n) < \infty$$

The norm topology $\|\phi\|_{\mu_{0,n}}$ makes $L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ a
Banach space, and it follows from the theory of "lifting" [10]
that the dual of this space is $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$, denoting
the space of all $M_{rba}(\mathcal{Y}_{0,n})$-valued functions $\{q\}$ which are
weak*-measurable in the sense that for each $\phi \in BC(\mathcal{Y}_{0,n})$,
$x^n \rightarrow q_{x^n}(\phi) \triangleq \int_{\mathcal{Y}_{0,n}} \phi(y^n) q(dy^n; x^n)$ is $\mu_{0,n}$-measurable
and $\mu_{0,n}$-essentially bounded.

B. Weak*-Compactness and Existence

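Define an admissible set of stochastic kernels associated
with the classical rate distortion function by

$$Q_{ad} \triangleq L^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n})) \subset L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$$

Clearly, $Q_{ad}$ is a unit sphere in $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$.
For each $\phi \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ we can define a linear
functional on $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$ by

$$\ell_\phi(q_{0,n}) \triangleq \int_{\mathcal{X}_{0,n}} \Big( \int_{\mathcal{Y}_{0,n}} \phi(x^n, y^n) q_{0,n}(dy^n; x^n) \Big) \mu_{0,n}(dx^n)$$

This is a bounded, linear and weak*-continuous functional
on $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$. For $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \rightarrow
[0, \infty)$ measurable and $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ the distortion
constraint set of the classical rate distortion function is

$$Q_{0,n}(D) \triangleq \Big\{ q \in Q_{ad} : \frac{1}{n+1} \ell_{d_{0,n}}(q_{0,n}) \le D \Big\}$$

It can be shown that $Q_{0,n}(D)$ is a bounded and weak*-closed
subset of $Q_{ad}$ and hence weak*-compact (compactness of
$Q_{ad}$ follows from Alaoglu's Theorem [7], [12]).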
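Next, we define the set of causal product kernels as follows.

$$\overrightarrow{L}^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n})) \triangleq \Big\{ \overrightarrow{q}_{0,n} \in L^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n})) : \overrightarrow{q}_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i), \; q_i(dy_i; y^{i-1}, x^i) \in \Pi_{rba}(\mathcal{Y}_i), \; i \in \mathbb{N}^n \Big\}$$

where $\overrightarrow{L}^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$ denotes the space of all
$\Pi_{rba}(\mathcal{Y}_{0,n})$-valued functions $\{\overrightarrow{q}\}$ which are weak*-measurable
in the sense that for each $\phi \in BC(\mathcal{Y}_{0,n})$, $x^n \rightarrow
\overrightarrow{q}_{x^n}(\phi) \triangleq \int_{\mathcal{Y}_{0,n}} \phi(y^n) \overrightarrow{q}(dy^n; x^n)$ is $\mu_{0,n}$-measurable and
$\mu_{0,n}$-essentially bounded.

Define the admissible set of causal product stochastic kernels
associated with the causal rate distortion function by

$$\overrightarrow{Q}_{ad} \triangleq \overrightarrow{L}^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$$

Clearly, $\overrightarrow{Q}_{ad} = \{q_{0,n} \in Q_{ad} : q_{0,n}(dy^n; x^n) =
\overrightarrow{q}_{0,n}(dy^n; x^n)\}$. For $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \rightarrow [0, \infty)$ which
is measurable and $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ the distortion
constraint set of the causal rate distortion function is

$$\overrightarrow{Q}_{0,n}(D) \triangleq \Big\{ \overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{ad} : \frac{1}{n+1} \ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) \triangleq \frac{1}{n+1} \int_{\mathcal{X}_{0,n}} \Big( \int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n) \overrightarrow{q}_{0,n}(dy^n; x^n) \Big) \mu_{0,n}(dx^n) \le D \Big\}$$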

Assumptions 3.1: We make the following assumptions.

1) The set $\overrightarrow{Q}_{ad}$ is weak*-closed.

2) The set $\overrightarrow{Q}_{0,n}(D)$ is non-empty.

Lemma 3.2: Suppose Assumptions 3.1 hold. Let
$\mathcal{X}_{0,n}, \mathcal{Y}_{0,n}$ be two Polish spaces and $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \rightarrow
[0, \infty]$ a measurable, non-negative, extended real valued
function, such that $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$. For any
$D \in [0, \infty)$, the set $\overrightarrow{Q}_{0,n}(D)$ is weak*-compact.
Proof. By Assumptions 3.1, $\overrightarrow{Q}_{ad}$ is weak*-closed, hence
as a subset of the weak*-compact set $Q_{ad}$ it is weak*-compact.
Also, under Assumptions 3.1, $\overrightarrow{Q}_{0,n}(D)$ is bounded and
weak*-closed and hence it is weak*-compact (as a weak*-closed
subset of the weak*-compact set $Q_{ad}$).

Theorem 3.3: Under Assumptions 3.1, $R^c_{0,n}(D)$ has a
minimum.

Proof. Follows from Lemma 3.2 and the lower semicontinuity
of $\mathbb{I}(\mu_{0,n}; \cdot)$ on $Q_{ad}$.

IV. NECESSARY CONDITIONS OF OPTIMALITY 
OF CAUSAL PRODUCT RATE DISTORTION 
FUNCTION 

In this section the form of the optimal causal product 
reconstruction kernels is derived. The method is based on 
calculus of variations on the space of measures [9]. 

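Theorem 4.1: Suppose $\mathbb{I}_{\mu_{0,n}}(\overrightarrow{q}_{0,n}) \triangleq \mathbb{I}(\mu_{0,n}; \overrightarrow{q}_{0,n})$ is
well defined for every $\overrightarrow{q}_{0,n} \in \overrightarrow{L}^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$, possibly
taking values from the set $[0, \infty]$. Then $\overrightarrow{q}_{0,n} \rightarrow
\mathbb{I}_{\mu_{0,n}}(\overrightarrow{q}_{0,n})$ is Gateaux differentiable at every point in
$\overrightarrow{L}^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$, and the Gateaux derivative at the
point $\overrightarrow{q}^0_{0,n}$ in the direction $\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n}$ is given by

$$\delta\mathbb{I}_{\mu_{0,n}}(\overrightarrow{q}^0_{0,n}; \overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n}) = \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\left(\frac{\overrightarrow{q}^0_{0,n}(dy^n; x^n)}{\nu^0_{0,n}(dy^n)}\right) (\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n})(dy^n; x^n) \mu_{0,n}(dx^n)$$

where $\nu^0_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ is the marginal measure corresponding
to $\overrightarrow{q}^0_{0,n} \otimes \mu_{0,n}(dx^n) \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$.

Proof. The proof is based on the fact that the causal product
stochastic kernel $\overrightarrow{q}_{0,n}$ is used to show the existence of the
Gateaux differential [9], rather than the individual causal
stochastic kernels $q_i(dy_i; y^{i-1}, x^i)$, $i \in \mathbb{N}^n$.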

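The constrained problem defined by (20) can be reformulated
using Lagrange multipliers as follows (equivalence of
constrained and unconstrained problems follows from [9]).

$$R^c_{0,n}(D) = \inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{ad}} \Big\{ \frac{1}{n+1} \mathbb{I}(\mu_{0,n}; \overrightarrow{q}_{0,n}) - s\Big( \frac{1}{n+1} \ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) - D \Big) \Big\} \qquad (21)$$

and $s \in (-\infty, 0]$ is the Lagrange multiplier.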

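Theorem 4.2: Suppose $d_{0,n}(x^n, y^n) = \sum_{i=0}^{n} \rho_{0,i}(x^i, y^i)$
and the assumptions of Lemma 3.2 hold. The infimum in (21)
is attained at $\overrightarrow{q}^*_{0,n} \in \overrightarrow{L}^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$ given by

$$\overrightarrow{q}^*_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q^*_i(dy_i; y^{i-1}, x^i) = \otimes_{i=0}^{n} \frac{e^{s\rho_{0,i}(x^i, y^i)} \nu^*_i(dy_i; y^{i-1})}{\int_{\mathcal{Y}_i} e^{s\rho_{0,i}(x^i, y^i)} \nu^*_i(dy_i; y^{i-1})} \qquad (22)$$

and $\nu^*_i(dy_i; y^{i-1}) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1})$. The causal rate distortion
function is given by

$$R^c_{0,n}(D) = sD - \frac{1}{n+1} \sum_{i=0}^{n} \int_{\mathcal{X}_{0,i} \times \mathcal{Y}_{0,i-1}} \log\Big( \int_{\mathcal{Y}_i} e^{s\rho_{0,i}(x^i, y^i)} \nu^*_i(dy_i; y^{i-1}) \Big) \overrightarrow{q}^*_{0,i-1}(dy^{i-1}; x^{i-1}) \otimes \mu_{0,i}(dx^i) \qquad (23)$$

If $R^c_{0,n}(D) > 0$ then $s < 0$ and

$$\frac{1}{n+1} \sum_{i=0}^{n} \int_{\mathcal{X}_{0,i}} \int_{\mathcal{Y}_{0,i}} \rho_{0,i}(x^i, y^i) \overrightarrow{q}^*_{0,i}(dy^i; x^i) \mu_{0,i}(dx^i) = D$$

Proof. The fully unconstrained problem of (21) is obtained
by introducing another Lagrange multiplier. Using this and
Theorem 4.1 we obtain (22) and (23).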



V. PROPERTIES OF CAUSAL RATE DISTORTION 
FUNCTION 

In this section, we present some important properties of
the causal rate distortion function as it is defined in (20).
Theorem 5.1:

1) $R^c_{0,n}(D)$ is a convex, non-increasing function of $D$.

2) If $\rho_{0,i} \in L_1(\pi_{0,i})$ then

a) $R^c_{0,n}\big(\frac{1}{n+1} \sum_{i=0}^{n} E_{\pi_{0,i}}(\rho_{0,i})\big) = 0$;

b) $R^c_{0,n}(D)$ is non-increasing for $D \in [0, D_{max}]$,
where $D_{max} \triangleq \frac{1}{n+1} \sum_{i=0}^{n} E_{\pi_{0,i}}(\rho_{0,i})$, and
$R^c_{0,n}(D) = 0$ for any $D \ge D_{max}$.

3) $R^c_{0,n}(D) > 0$ for all $D < D_{max}$ and $R^c_{0,n}(D) = 0$
for all $D \ge D_{max}$, where

$$D_{max} = \min_{y^n \in \mathcal{Y}_{0,n}} \frac{1}{n+1} \sum_{i=0}^{n} \int_{\mathcal{X}_{0,i}} \rho_{0,i}(x^i, y^i) \mu_{0,i}(dx^i)$$

if such a minimum exists.
Proof. Omitted due to space limitation.
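For a single stage (n = 0) on finite alphabets, the quantity $D_{max}$ in statement 3) is the smallest average distortion achievable by a fixed reproduction symbol, so no information about the source is needed to meet it. The sketch below is an illustration under those assumptions; the function name `d_max` and the example source are hypothetical.

```python
import numpy as np

def d_max(mu, rho):
    """Single-stage D_max: min over y of sum_x mu[x] * rho[x, y]."""
    return float(np.min(mu @ rho))

hamming = np.array([[0.0, 1.0], [1.0, 0.0]])

# Uniform source: either constant reproduction is wrong half of the time.
print(d_max(np.array([0.5, 0.5]), hamming))
# Biased source: always reproducing the likelier symbol errs with probability 0.2.
print(d_max(np.array([0.8, 0.2]), hamming))
```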

VI. CONCLUSION AND FUTURE WORK 

A. Conclusion 

The solution of the causal rate distortion function subject 
to a reproduction kernel which is a product of causal kernels 
is presented, on abstract alphabets. Some of its properties are 
also presented. It is believed that the optimal reconstruction 
kernel as a product of causal kernels has several implications 
in applications where causality of the decoder as a function 
of the source is of concern. 

B. Future Work 

Examples are currently under investigation, and will be
presented in the final version of the paper.

VII. APPENDIX 